Random walks on networks: cumulative distribution of cover time

We derive an exact closed-form analytical expression for the distribution of the cover time for a random walk over an arbitrary graph. As special cases, we derive simplified, exact expressions for the distributions of cover time for a complete graph, a cycle graph, and a path graph. An accurate approximation for the cover time distribution, with computational complexity of $O(2n)$, is also presented. The approximation is numerically tested only for graphs with $n \le 1000$ nodes.

PACS numbers: 



I. INTRODUCTION 

The random walk is a fundamental dynamic process that can be used to model random processes inherent to many important applications, such as transport in disordered media [1], neuron firing dynamics [2], spreading of diseases, or transport and search processes.

In this paper, we investigate random walks on graphs and derive exact expressions for the cumulative distribution functions of three quantities of a random walk that play the most important role in the theory of random walks: (1) hitting time $h_{ij}$ (or first-passage time), which is the number of steps before node $j$ is visited starting from node $i$; (2) commute time $\kappa_{ij} = h_{ij} + h_{ji}$; and (3) cover time, that is, the number of steps to reach every node.

Average hitting time, average commute time, and average cover time have recently been studied in several papers. In [10] the authors investigate random walks on complex networks and derive an exact expression for the mean first-passage time between two nodes. For each node the random walk centrality is introduced, which determines the relative speed by which a node can receive and spread information over the network in a random process. Using both numerical simulations and scaling arguments, the behavior of a random walker on a one-dimensional small-world network is studied in [11]. The average number of distinct sites visited by the random walker, the mean-square displacement of the walker, and the distribution of first-return times obey a characteristic scaling form. The expected time for a random walk to traverse between two arbitrary sites of the Erdős-Rényi random graph is studied in [12]. The properties of random walks on complex trees are studied in [13]. Both the vertex discovery rate and the mean topological displacement from the origin present a considerable slowing down in the tree case. Moreover, the mean first-passage time displays a logarithmic degree dependence, in contrast to the inverse degree shape exhibited in looped networks [13]. The random walk on networks is also highly relevant to algorithmic applications. The expected time taken to visit every vertex of connected graphs has recently been studied extensively. In a series of papers, Cooper and Frieze have studied the average cover time of various models of a random graph.

This is an outline of the paper. In Section II we derive closed formulas of the cumulative distribution function for hitting time, commute time, and cover time; we also present a simple example of a graph with four nodes, and derive closed formulas of the cumulative distribution function for cover time of complete, cycle, and path graphs. An approximation of the cumulative distribution function for cover time is proposed in Section III. We also present some numerical results of the cumulative distribution function for cover time of different graphs in Section IV. We finish the paper with conclusions.



II. EXACT RANDOM WALK DISTRIBUTIONS 
FOR HITTING TIME, COMMUTE TIME, AND 
COVER TIME 

Let $G = (V, E)$ be a connected graph with $n$ nodes and $m$ edges. Consider a random walk on $G$: we start at a node $v_0$; if at the $t$-th step we are at a node $v_t$, we move to a neighbor of $v_t$ with probability $1/d(v_t)$, where $d(v_t)$ is the degree of the node $v_t$. Clearly, the sequence of random nodes $(v_t : t = 0, 1, \ldots)$ is a Markov chain. We denote by $M = (m_{ij})_{i,j \in V}$ the matrix of transition probabilities of this Markov chain:

$$m_{ij} = \begin{cases} 1/d(i), & \text{if } ij \in E, \\ 0, & \text{otherwise,} \end{cases} \qquad (1)$$

where $d(i)$ is the degree of the node $i$. Recall that the probability $m^t_{ij}$ of the event that, starting at $i$, the random walk will be at node $j$ after $t$ steps is an entry of the matrix $M^t$. It is well known that $m^t_{ij} \to d(j)/2m$ as $t \to \infty$ when $G$ is not bipartite.

We now introduce three quantities of a random walk that play the most important role in the theory of random walks: (1) hitting time $h_{ij}$ is the number of steps before node $j$ is first visited starting from node $i$; (2) commute time $\kappa_{ij} = h_{ij} + h_{ji}$ is the number of steps in a random walk starting at $i$ before node $j$ is visited and then node $i$ is reached again; and (3) cover time is the number of steps to reach every node.

A. Hitting time 

We first calculate the probability mass function for the hitting time. To calculate the hitting time from $i$ to $j$, we replace the node $j$ with an absorbing node. Let $D_j = (d_{ik})$ be a matrix such that $d_{ik} = m_{ik}$ for all $i \neq j$, $d_{jk} = 0$ for all $k \neq j$, and $d_{jj} = 1$. This means that the matrix $D_j$ is obtained from $M$ by replacing the original row $j$ with the basis row-vector $e_j$, for which the $j$-th element is 1 and all other elements are 0. Let $d^t_{ij}$ be the $ij$ entry of the matrix $D_j^t$, denoting the probability that starting from $i$ the walker is in the node $j$ by time $t$. Since $j$ is an absorbing state, $d^t_{ij}$ is the probability of reaching $j$, originating from $i$, in not more than $t$ steps, i.e., $d^t_{ij}$ is the cumulative distribution function (CDF) of the hitting time. Note that the $j$-th column of the matrix $D_j^t$ approaches the all-ones vector as $t \to \infty$. The probability mass function of the hitting time $h_{ij}$ to reach $j$ starting from $i$ is, therefore, given by

$$p_{h_{ij}}(t) = d^t_{ij} - d^{t-1}_{ij}, \qquad t \ge 1.$$
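The absorbing-chain construction above can be sketched numerically. The following is a minimal Python sketch, assuming 0-based node labels and a graph given as an adjacency list; the helper names `transition_matrix` and `hitting_cdf` and the three-node test path are illustrative choices, not part of the original derivation. Exact rational arithmetic is used so that the CDF $d^t_{ij}$ and the PMF $d^t_{ij} - d^{t-1}_{ij}$ can be inspected without rounding.

```python
from fractions import Fraction

def transition_matrix(adj):
    """Row-stochastic matrix M of Eq. (1) from an adjacency list (0-based nodes)."""
    n = len(adj)
    M = [[Fraction(0)] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            M[i][j] = Fraction(1, len(nbrs))
    return M

def hitting_cdf(M, i, j, t_max):
    """Return [d^0_ij, ..., d^t_max_ij]: P(walk from i has reached j within t steps)."""
    n = len(M)
    # D_j: copy of M with row j replaced by the basis row-vector e_j.
    D = [row[:] for row in M]
    D[j] = [Fraction(int(k == j)) for k in range(n)]
    dist = [Fraction(int(k == i)) for k in range(n)]  # row i of D_j^0
    cdf = [dist[j]]
    for _ in range(t_max):
        # One step of the absorbing chain: row vector times D_j.
        dist = [sum(dist[a] * D[a][b] for a in range(n)) for b in range(n)]
        cdf.append(dist[j])
    return cdf

# Hypothetical test graph: the path 0 - 1 - 2, walking from node 0 to node 2.
adj = [[1], [0, 2], [1]]
M = transition_matrix(adj)
cdf = hitting_cdf(M, 0, 2, 6)
pmf = [cdf[t] - cdf[t - 1] for t in range(1, len(cdf))]  # pmf[t-1] = p_h(t), t >= 1
```

On this small path the far end can first be reached only at even times, so the PMF vanishes at odd $t$.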

Let $E^t_{x_j}$ be the event of reaching the node $x_j$ starting from the node $i \neq x_j$ by time $t$. Consider a sequence of events $\{E^t_{x_1}, E^t_{x_2}, \ldots, E^t_{x_k}\}$. What is the probability of the event that, starting from node $i$, the walker visits one of the nodes $\{x_1, x_2, \ldots, x_k\}$ by time $t$? Obviously, it is the probability of the union $\bigcup_{j=1}^{k} E^t_{x_j}$. To calculate this probability, we replace the nodes $\{x_1, x_2, \ldots, x_k\}$ with absorbing nodes. Let $D_x$ be the matrix obtained from $M$ by replacing the rows $\{x_1, x_2, \ldots, x_k\}$ with the basis row-vectors $e_{x_1}, e_{x_2}, \ldots, e_{x_k}$, respectively. Let $d^t_{i x_j}$ be the $i x_j$ entry of the matrix $D_x^t$. Then $\sum_{j=1}^{k} d^t_{i x_j}$ is the probability that starting from $i$ we reach for the first time one of the nodes $\{x_1, x_2, \ldots, x_k\}$ in at most $t$ steps. Therefore,

$$F_{x_1,\ldots,x_k}(t) = \sum_{j=1}^{k} d^t_{i x_j} \qquad (2)$$

is the cumulative distribution function (CDF) of the hitting time $h_{ix}$ of the union of events. The probability of reaching one of the nodes $\{x_1, x_2, \ldots, x_k\}$, starting from $i$, in the $t$-th step is given by

$$p_{h_{ix}}(t) = F_{x_1,\ldots,x_k}(t) - F_{x_1,\ldots,x_k}(t-1), \qquad t \ge 1,$$

which gives the probability mass function (PMF) of the hitting time $h_{ix}$ of the union $\bigcup_{j=1}^{k} E^t_{x_j}$.



B. Commute time 

The probability mass function of the commute time $\kappa_{ij} = h_{ij} + h_{ji}$ is obtained as the convolution of the PMFs of the two random variables $h_{ij}$ and $h_{ji}$:

$$p_{\kappa_{ij}}(t) = p_{h_{ij}}(t) * p_{h_{ji}}(t) = \sum_{\tau=1}^{t} p_{h_{ij}}(\tau)\, p_{h_{ji}}(t-\tau).$$

The cumulative distribution function of the commute time can also be derived as follows: we copy our Markov chain, we modify the original Markov chain by deleting all outgoing edges of the node $j$, and we modify the copied Markov chain by replacing all outgoing edges of the node $i'$ (which is a copy of the node $i$ of the original Markov chain) with a self-loop. We then connect the two chains by adding one directed edge from node $j$ to its copy $j'$ in the copied chain. Let $O$ be the $n \times n$ matrix of all 0s, $O_j = (o_{lm})$ be the $n \times n$ matrix for which all elements are 0 except $o_{jj} = 1$, and $D_j^{*}$ be the matrix obtained from $M$ by replacing the $j$-th row with all 0s. Define the $2n \times 2n$ matrix $C$ as

$$C = \begin{pmatrix} D_j^{*} & O_j \\ O & D_i \end{pmatrix},$$

where $D_i$ is the matrix $M$ with row $i$ replaced by the basis row-vector $e_i$, as in Section II A. The matrix $C$ is a transition matrix of the modified Markov chain with $2n$ states (the original Markov chain and its copy). Let $c^t_{i,i+n}$ be the $(i, i+n)$ element of the matrix $C^t$. This element is the cumulative distribution function of $\kappa_{ij} + 1$, where the additional step is the transition from $j$ to its copy $j'$.
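The two-copy construction can be illustrated with a short Python sketch. It assumes 0-based indices (so the copy of node $a$ has index $n + a$) and $i \neq j$; the function name `commute_cdf` is hypothetical, and the four-node matrix is the one used in the example of Section II D. The sketch builds $C$ block by block and reads off the entry $(i, i+n)$ of $C^t$.

```python
from fractions import Fraction

def commute_cdf(M, i, j, t_max):
    """Entry (i, i+n) of C^t for t = 0..t_max, with C the 2n x 2n block matrix."""
    n = len(M)
    C = [[Fraction(0)] * (2 * n) for _ in range(2 * n)]
    for a in range(n):
        for b in range(n):
            if a != j:                  # top-left block: M with row j zeroed
                C[a][b] = M[a][b]
            if a != i:                  # bottom-right block: copy with i' absorbing
                C[n + a][n + b] = M[a][b]
    C[j][n + j] = Fraction(1)           # directed edge from j to its copy j'
    C[n + i][n + i] = Fraction(1)       # self-loop at i'
    dist = [Fraction(int(k == i)) for k in range(2 * n)]
    out = [dist[n + i]]
    for _ in range(t_max):
        dist = [sum(dist[a] * C[a][b] for a in range(2 * n)) for b in range(2 * n)]
        out.append(dist[n + i])
    return out

F = Fraction
M = [[0, F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 2), 0, F(1, 2), 0],
     [F(1, 3), F(1, 3), 0, F(1, 3)],
     [F(1, 2), 0, F(1, 2), 0]]
c = commute_cdf(M, 0, 3, 40)   # node 1 -> node 4 -> node 1 in the paper's labels
```

Because of the extra step from $j$ to $j'$, the first non-zero value appears at $t = 3$: one step to reach node 4, one step to the copy, and one step back to the copy of node 1.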



C. Cover time 

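Cover time is defined as the number of steps to reach all nodes in the graph. In order to determine the CDF of the cover time, we consider the event $\bigcap_{k=1, k \neq z}^{n} E^t_{x_k}$ and use the well-known inclusion-exclusion formula for multiple events:

$$P\Big(\bigcap_{k=1, k\neq z}^{n} E^t_{x_k}\Big) = \sum_{i=1, i \neq z}^{n} P(E^t_{x_i}) - \sum_{i<j;\; i,j \neq z} P(E^t_{x_i} \cup E^t_{x_j}) + \cdots + (-1)^{n-2} P\Big(\bigcup_{k=1, k\neq z}^{n} E^t_{x_k}\Big).$$

From the last equation and equation (2), we determine the cumulative distribution function of the cover time as

$$F_{\mathrm{cover}}(t) = \sum_{i \neq z} F_{x_i}(t) - \sum_{i<j;\; i,j\neq z} F_{x_i,x_j}(t) + \cdots + (-1)^{n-2} F_{x_1,\ldots,x_n}(t), \qquad (3)$$

where $z$ is the starting node of the walk, every index set excludes $z$, and the last term refers to the union over all $n-1$ nodes other than $z$.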




Equation (3) is the main result of this paper. The probability mass function of the cover time can be easily computed from Eq. (3). We note that Eq. (3) is practically applicable only for small values of $n$; in fact, the computational complexity of Eq. (3) at a single time step is:

$$\sum_{i=1}^{n-1} \binom{n-1}{i} = 2^{n-1} - 1.$$
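A direct, brute-force evaluation of Eq. (3) can be sketched as follows. This is an illustrative Python sketch under stated assumptions (0-based nodes, hypothetical helper names `union_hitting_cdf` and `cover_cdf`); it enumerates all $2^{n-1} - 1$ non-empty subsets of the nodes other than the start, so it is only meant for very small graphs.

```python
from fractions import Fraction
from itertools import combinations

def union_hitting_cdf(M, start, targets, t):
    """F_{x_1..x_k}(t) of Eq. (2): P(some target is reached within t steps)."""
    n = len(M)
    absorbing = set(targets)
    dist = [Fraction(int(k == start)) for k in range(n)]
    for _ in range(t):
        new = [Fraction(0)] * n
        for a in range(n):
            if a in absorbing:
                new[a] += dist[a]            # absorbed probability mass stays put
            else:
                for b in range(n):
                    new[b] += dist[a] * M[a][b]
        dist = new
    return sum(dist[x] for x in absorbing)

def cover_cdf(M, start, t):
    """Inclusion-exclusion sum of Eq. (3) over subsets of the non-start nodes."""
    others = [v for v in range(len(M)) if v != start]
    total = Fraction(0)
    for k in range(1, len(others) + 1):
        sign = 1 if k % 2 == 1 else -1       # (-1)^(k-1) for subsets of size k
        for subset in combinations(others, k):
            total += sign * union_hitting_cdf(M, start, subset, t)
    return total

h = Fraction(1, 2)
K3 = [[0, h, h], [h, 0, h], [h, h, 0]]       # complete graph on 3 nodes
P3 = [[0, 1, 0], [h, 0, h], [0, 1, 0]]       # path 0 - 1 - 2
k3_values = [cover_cdf(K3, 0, t) for t in range(5)]
p3_mid = cover_cdf(P3, 1, 3)
```

For the triangle the cover CDF equals $1 - 2^{1-t}$ for $t \ge 1$, which gives a quick sanity check of the alternating signs.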

D. An Example 

We now present a simple example to illustrate our results. Consider a random walk on a network with four nodes, see Fig. 1, such that the matrix of transition probabilities of the corresponding Markov chain is given by

$$M = \begin{bmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \end{bmatrix}.$$

Let $m^t_{ij}$ be the $(i,j)$-th element of the matrix $M^t$. Since, in this example, $M$ is a $4 \times 4$ matrix, one can compute analytically, using for example the software package Mathematica, the elements of the matrix $M^t$. Thus, it can be found, for example, that

$$m^t_{14} = \frac{1}{5}\left(1 - (-1)^t \left(\frac{2}{3}\right)^t\right),$$

which is the probability that the walker starting from $i = 1$ is in $j = 4$ at time $t$. Note that $\lim_{t\to\infty} m^t_{14} = 1/5$.
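The expression for $m^t_{14}$ can be checked against direct powers of $M$. The sketch below is a small Python illustration using exact fractions; indices are 0-based (node 1 is index 0, node 4 is index 3) and the helper names are hypothetical.

```python
from fractions import Fraction

F = Fraction
M = [[0, F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 2), 0, F(1, 2), 0],
     [F(1, 3), F(1, 3), 0, F(1, 3)],
     [F(1, 2), 0, F(1, 2), 0]]

def row_of_power(M, i, t):
    """Row i of M^t, i.e. the distribution of the walker after t steps from i."""
    n = len(M)
    dist = [F(int(k == i)) for k in range(n)]
    for _ in range(t):
        dist = [sum(dist[a] * M[a][b] for a in range(n)) for b in range(n)]
    return dist

def m14_closed_form(t):
    """(1/5) * (1 - (-1)^t (2/3)^t), written with a single power of -2/3."""
    return F(1, 5) * (1 - F(-2, 3) ** t)

checks = [row_of_power(M, 0, t)[3] == m14_closed_form(t) for t in range(12)]
```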

To compute the hitting time to reach the node 4 starting from an arbitrary node, we modify the existing random walk to the random walk shown in Fig. 2. The transition matrix of the modified walk is:

$$D_4 = \begin{bmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Let $d^t_{ij}$ be the elements of the matrix $D_4^t$. Again, the elements of the matrix $D_4^t$ can be computed analytically. For example, $d^t_{14}$ is the probability of reaching the node 4 starting from node 1 in at most $t$ steps. Clearly, as for any cumulative distribution, $\lim_{t\to\infty} d^t_{14} = 1$.
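As a numerical companion to this step, the sketch below computes $d^t_{14}$ in three ways: directly from powers of $D_4$, from the survival recurrence $u_{t+1} = (u_t + u_{t-1})/3$ with $u_0 = 1$, $u_1 = 2/3$ and $d^t_{14} = 1 - u_t$, and from the closed form that solves this recurrence. The recurrence and the closed form are derived here from the symmetry between nodes 1 and 3 in $D_4$; they are an assumption-labelled illustration and are not claimed to match the form of the expression used by the authors. All function names are hypothetical.

```python
from fractions import Fraction
from math import isclose, sqrt

F = Fraction
D4 = [[0, F(1, 3), F(1, 3), F(1, 3)],
      [F(1, 2), 0, F(1, 2), 0],
      [F(1, 3), F(1, 3), 0, F(1, 3)],
      [0, 0, 0, 1]]

def d14_from_powers(t):
    """Entry (1, 4) of D_4^t, computed by repeated row-vector multiplication."""
    dist = [F(1), F(0), F(0), F(0)]
    for _ in range(t):
        dist = [sum(dist[a] * D4[a][b] for a in range(4)) for b in range(4)]
    return dist[3]

def d14_from_recurrence(t):
    """1 - u_t, where u_t is the probability of not yet being absorbed at node 4."""
    u_prev, u = F(1), F(2, 3)            # u_0, u_1
    if t == 0:
        return 1 - u_prev
    for _ in range(t - 1):
        u_prev, u = u, (u + u_prev) / 3
    return 1 - u

def d14_closed_form(t):
    """Floating-point solution of the recurrence; roots are (1 +/- sqrt(13)) / 6."""
    r = sqrt(13)
    a, b = (13 + 3 * r) / 26, (13 - 3 * r) / 26
    return 1 - a * ((1 + r) / 6) ** t - b * ((1 - r) / 6) ** t

values = [d14_from_powers(t) for t in range(15)]
```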

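Let us now compute the probability, starting from node 1, of reaching the node 4 for the first time and then reaching node 1 for the first time starting from 4, within time $t$. For this, we consider the modified random walk shown in Fig. 3, with the transition matrix given by:

$$C = \begin{bmatrix}
0 & 1/3 & 1/3 & 1/3 & 0 & 0 & 0 & 0 \\
1/2 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 0 & 1/2 & 0 \\
0 & 0 & 0 & 0 & 1/3 & 1/3 & 0 & 1/3 \\
0 & 0 & 0 & 0 & 1/2 & 0 & 1/2 & 0
\end{bmatrix}.$$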



The element $c^t_{15}$ of the matrix $C^t$ is the cumulative distribution function of $\kappa_{14} + 1$. Its closed form, Eq. (4), is a sum of $t$-th powers with coefficients involving $\sqrt{13}$ and $\sqrt{3}$; the displayed expression is not recoverable here. Notice that again $\lim_{t\to\infty} c^t_{15} = 1$.

As the last example, we consider the probability of reaching the node 4 or the node 2 from the node 1 for the first time within $t$ time steps. The modified random walk is shown in Fig. 4, and the transition probability matrix of the modified walk is given by the matrix $D_{4;2}$, which has the form:

$$D_{4;2} = \begin{bmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1 & 0 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

The elements $d^t_{12}$ and $d^t_{14}$ of the matrix $D^t_{4;2}$ are

$$d^t_{12} = d^t_{14} = \frac{1}{2} - \frac{1}{2 \cdot 3^t}.$$

The cumulative distribution function of the event that the node 2 or the node 4 is reached from the node 1 for the first time by the $t$-th step is given by $d^t_{12} + d^t_{14} = 1 - 3^{-t}$.
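The union example can be reproduced with the same absorbing-chain idea, now with two absorbing rows. The following Python sketch (0-based indices, hypothetical helper name `row_of_power`) checks the entries of $D^t_{4;2}$ and their sum for the first few values of $t$.

```python
from fractions import Fraction

F = Fraction
# D_{4;2}: rows of nodes 2 and 4 (indices 1 and 3) replaced by basis row-vectors.
D42 = [[0, F(1, 3), F(1, 3), F(1, 3)],
       [0, 1, 0, 0],
       [F(1, 3), F(1, 3), 0, F(1, 3)],
       [0, 0, 0, 1]]

def row_of_power(D, i, t):
    """Row i of D^t, computed by repeated row-vector multiplication."""
    n = len(D)
    dist = [F(int(k == i)) for k in range(n)]
    for _ in range(t):
        dist = [sum(dist[a] * D[a][b] for a in range(n)) for b in range(n)]
    return dist

rows = [row_of_power(D42, 0, t) for t in range(10)]
d12 = [row[1] for row in rows]           # absorbed at node 2 by time t
d14 = [row[3] for row in rows]           # absorbed at node 4 by time t
union_cdf = [a + b for a, b in zip(d12, d14)]
```

The walker survives each step only by moving between nodes 1 and 3, which happens with probability $1/3$, so the union CDF is one minus a geometric tail.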



FIG. 3: Modified random walk for computing the commute 
time starting from node 1 to reach for the first time the node 
4 and then to reach 1 for the first time starting from 4 



E. Cover time for complete, cycle and path graph 




In this subsection we derive exact expressions for the 
CDF of cover time for three particular graphs: complete, 
cycle, and path graph. 



FIG. 4: Modified random walk for computing the probability 
of reaching the node 4 or the node 2 from arbitrary node 



FIG. 5: Cycle graph with 12 nodes

FIG. 6: Path graph with 7 nodes

1. Complete graph

A complete graph is a simple graph in which every pair of distinct vertices is connected by an edge. The complete graph on $n$ vertices has $n(n-1)/2$ edges and is denoted by $K_n$. We can now easily derive analytical results for the PMF of the cover time of a complete graph. It is easy to see that for the complete graph we have

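$$P(E^t_{x_i}) = 1 - \frac{(n-2)^t}{(n-1)^t},$$

$$P(E^t_{x_i} \cup E^t_{x_j}) = 1 - \frac{(n-3)^t}{(n-1)^t},$$

$$P(E^t_{x_i} \cup E^t_{x_j} \cup E^t_{x_l}) = 1 - \frac{(n-4)^t}{(n-1)^t}.$$

Therefore, for the union of any $k$ of these events,

$$P\Big(\bigcup_{j=1}^{k} E^t_{x_j}\Big) = 1 - \frac{(n-k-1)^t}{(n-1)^t}.$$

Thus, the cumulative distribution function of the cover time for the complete graph with $n$ nodes can be expressed as

$$F_c(t) = \sum_{j=1}^{n-1} (-1)^{j-1} \frac{\Gamma(n)}{j!\,\Gamma(n-j)} \left(1 - \frac{(n-j-1)^t}{(n-1)^t}\right) = 1 - \sum_{j=1}^{n-1} (-1)^{j-1} \frac{\Gamma(n)\,(n-j-1)^t}{j!\,\Gamma(n-j)\,(n-1)^t}.$$

Therefore, the probability mass function is:

$$f_c(t) = \sum_{j=1}^{n-1} (-1)^{j-1} \frac{\Gamma(n-1)}{\Gamma(j)\,\Gamma(n-j)} \left(\frac{n-j-1}{n-1}\right)^{t-1}.$$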
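The complete-graph expressions are cheap to evaluate, since only $n - 1$ terms appear. The Python sketch below is an illustration with hypothetical function names; it writes $\Gamma(n)/(j!\,\Gamma(n-j))$ as the binomial coefficient $\binom{n-1}{j}$ and $\Gamma(n-1)/(\Gamma(j)\Gamma(n-j))$ as $\binom{n-2}{j-1}$, and uses exact fractions so that the PMF can be compared with differences of the CDF.

```python
from fractions import Fraction
from math import comb

def complete_cover_cdf(n, t):
    """F_c(t) for the complete graph K_n (n >= 2), from the alternating sum."""
    return sum((-1) ** (j - 1) * comb(n - 1, j)
               * (1 - Fraction(n - j - 1, n - 1) ** t)
               for j in range(1, n))

def complete_cover_pmf(n, t):
    """f_c(t) = F_c(t) - F_c(t - 1) for t >= 1, using binom(n-2, j-1) weights."""
    return sum((-1) ** (j - 1) * comb(n - 2, j - 1)
               * Fraction(n - j - 1, n - 1) ** (t - 1)
               for j in range(1, n))

cdf_k4 = [complete_cover_cdf(4, t) for t in range(8)]
pmf_k4 = [complete_cover_pmf(4, t) for t in range(1, 8)]
```

For $K_4$ the walk needs at least three steps to cover the graph, and covering in exactly three steps requires a new node at every step, which has probability $1 \cdot \tfrac{2}{3} \cdot \tfrac{1}{3} = \tfrac{2}{9}$.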



2. Cycle graph

A cycle graph is a graph that consists of a single cycle, or in other words, some number of vertices connected in a closed chain. Let us denote the cycle graph with $n$ vertices as $C_n$. The number of vertices in $C_n$ equals the number of edges, and every vertex has degree 2; that is, every vertex has exactly two edges incident with it. An example of a cycle graph with 12 nodes is given in Fig. 5.

Let us assume that the first node of the cycle graph is the starting node of the walk. We need to find the intersection of the events of reaching nodes $2, 3, \ldots, n$. These events form a path. A path in a graph is a sequence of vertices such that, from each of its vertices, there is an edge to the next vertex in the sequence. A cycle is a path such that the start vertex and end vertex are the same. Note that the choice of the start vertex in a cycle is arbitrary. By exploiting the Remark of Corollary 3.1.16, proved in [19] for events that form a path, we find that the cumulative distribution function of the cover time for a cycle graph is:

$$F_{\mathrm{cover}}(t) = \sum_{i=2}^{n} P(E^t_i) - \sum_{i=2}^{n-1} P(E^t_i \cup E^t_{i+1}).$$

3. Path graph

A path graph is a particularly simple example of a tree, namely one which is not branched at all, that is, it contains only nodes of degree two and one. In particular, two of its vertices have degree 1 and all others (if any) have degree 2. An example of a path graph with 7 nodes is given in Fig. 6.

To find the cumulative distribution function of the cover time for a path graph, we note that all the nodes will be covered if the first and the last nodes are reached by the random walker. Therefore, the cumulative distribution function of the cover time for a path graph is

$$F_{\mathrm{cover}}(t) = P(E^t_1 \cap E^t_n) = P(E^t_1) + P(E^t_n) - P(E^t_1 \cup E^t_n).$$

We note that if the first node is the starting node then $P(E^t_1 \cap E^t_n) = P(E^t_n)$, and if the last node is the starting node then $P(E^t_1 \cap E^t_n) = P(E^t_1)$.

III. APPROXIMATION OF THE CDF OF COVER TIME


The cumulative distribution functions for hitting and commute time can be computed for reasonably large graphs. The complexity of matrix multiplication, if carried out naively, is $O(n^3)$, but more efficient algorithms do exist, which are computationally interesting for matrices with dimensions $n > 100$ [15].

The inclusion-exclusion formula (3) has little practical value for graphs with a large number of nodes, since it then requires extensive computational time. In the following, we present an accurate and useful approximation of (3) that can be evaluated in a reasonable time. The first inequality for the inclusion-exclusion was discovered by Ch. Jordan [16], and since then a lot of work has been done on sharpening the bounds or the approximation. An excellent survey of the various results for the inclusion-exclusion is given in [17].

We propose the following approximation for the inclusion-exclusion formula:

$$F_{\mathrm{cover}}(t) \approx \prod_{k=1}^{n} P(E_{x_k}) \prod_{i=1}^{n-1} \frac{P(E_{x_i} \cap E_{x_{i+1}})}{P(E_{x_i})\, P(E_{x_{i+1}})}, \qquad (5)$$

where $P(E_{x_i} \cap E_{x_{i+1}}) = P(E_{x_i}) + P(E_{x_{i+1}}) - P(E_{x_i} \cup E_{x_{i+1}})$. The node indexes must be arranged in such a way that there exists an edge between nodes $x_i$ and $x_{i+1}$. This condition is not strict, and there can exist a small number of nodes that do not satisfy it. The Appendix presents the heuristic derivation of (5) by using the method proposed in [18].

As can be seen, the single-step computational complexity of Eq. (5) is $O(2n)$. The proposed approximation is very accurate for strongly connected graphs like the complete graph and is less accurate for poorly connected graphs like the path graph, as can be seen from the figures below. We note that the error of the approximation is largest for the path graph when the middle node is the starting node, so this case gives an upper bound on the error. This is due to the fact that the proposed approximation reduces to the exact equation for independent events, while its accuracy diminishes as the events become more and more dependent. The approximation is least accurate when almost every event is a subset of another event. Thus the formula is least accurate when it is applied to a path graph with $n$ nodes and the walk starts from the middle, $n/2$-th node (assuming that $n$ is an even number). In this case the event of reaching nodes 1 to $n/2 - 1$ is the event of reaching node 1, and the event of reaching nodes $n/2 + 1$ to $n$ is the event of reaching node $n$.

Interestingly, if we vary the starting node and evaluate the error of the approximation for a path graph, then the formula becomes more and more accurate as the starting node approaches the first or the $n$-th node, and when the starting node is the first or the last node the approximation formula reduces to the exact formula. To prove this, we let the first node be the starting node; then the probability of the event of reaching node $i$ and node $j$, $j > i$, is $P(E_i \cap E_j) = P(E_j)$. Thus the approximation formula is given by

$$F_{\mathrm{cover}}(t) \approx \prod_{k=1}^{n} P(E_k) \prod_{i=1}^{n-1} \frac{P(E_{i+1})}{P(E_i)\, P(E_{i+1})} = P(E_n). \qquad (6)$$

FIG. 7: Exact and the approximate formula of the CDF for a path graph with two different starting nodes (40 nodes, starting node is the 20-th node; 30 nodes, starting node is the 4-th node; horizontal axis: time step)



A similar proof applies when the $n$-th node is the starting node. An example is given in Fig. 7, where the CDF of the cover time is given by the exact and the approximate formula for two path graphs with 30 and 40 nodes when the starting node is the 4-th and the 20-th node, respectively. The second-worst-case error occurs when the approximation formula is applied to a cycle graph, and in this case the error is independent of the starting node.
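The behaviour of the approximation (5) on a path graph can be explored with a short Python sketch. It is a minimal illustration under stated assumptions: nodes are 0-based, the helper names are hypothetical, the node order is the natural order along the path (with the starting node included, for which the hitting probability is 1), and the approximation is set to 0 whenever some node is not yet reachable, a convention introduced here only to avoid division by zero.

```python
from fractions import Fraction

def path_matrix(n):
    """Transition matrix of the simple random walk on the path 0 - 1 - ... - (n-1)."""
    M = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            M[i][j] = Fraction(1, len(nbrs))
    return M

def union_cdf(M, start, targets, t):
    """P(at least one target has been visited within t steps), cf. Eq. (2)."""
    n = len(M)
    absorbing = set(targets)
    dist = [Fraction(int(k == start)) for k in range(n)]
    for _ in range(t):
        new = [Fraction(0)] * n
        for a in range(n):
            if a in absorbing:
                new[a] += dist[a]                 # absorbed mass stays put
            else:
                for b in range(n):
                    new[b] += dist[a] * M[a][b]
        dist = new
    return sum(dist[x] for x in absorbing)

def exact_path_cover_cdf(M, start, t):
    """P(E_first) + P(E_last) - P(E_first or E_last) for a path graph."""
    first, last = 0, len(M) - 1
    return (union_cdf(M, start, [first], t) + union_cdf(M, start, [last], t)
            - union_cdf(M, start, [first, last], t))

def approx_cover_cdf(M, start, order, t):
    """Eq. (5) with consecutive nodes of `order` used as the neighbouring pairs."""
    p = {x: union_cdf(M, start, [x], t) for x in order}
    if any(v == 0 for v in p.values()):
        return Fraction(0)                        # assumed convention, see lead-in
    result = Fraction(1)
    for x in order:
        result *= p[x]
    for a, b in zip(order, order[1:]):
        p_inter = p[a] + p[b] - union_cdf(M, start, [a, b], t)
        result *= p_inter / (p[a] * p[b])
    return result

M = path_matrix(5)
order = list(range(5))
end_start = [(approx_cover_cdf(M, 0, order, t), exact_path_cover_cdf(M, 0, t))
             for t in range(30)]
mid_start = (approx_cover_cdf(M, 2, order, 2), exact_path_cover_cdf(M, 2, 2))
```

With the walk started at an end node the two values coincide for every $t$, as in Eq. (6); started in the middle, the approximation reduces to the product of the two end-node hitting probabilities and therefore differs from the exact value.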



IV. NUMERICAL EXAMPLES 

In this section, several numerical examples are presented. First, we validate the cover time formula and the approximation by Monte Carlo simulations, Fig. 8, for an Erdős-Rényi random graph with 20 nodes. Fig. 8a illustrates the CDF, while Fig. 8b illustrates the PMF of cover time. We illustrate the accuracy of the approximation for a path graph and a complete graph, Figures 9 and 10 respectively, where the starting node of the walk for the path graph is the middle node for both Figures 9a and 9b. We have performed various numerical simulations of the cumulative distribution functions using exact and approximate expressions for complete, path, and cycle graphs with up to 1000 nodes. We found that the accuracy of the approximation is maintained as $n$ increases up to 1000. We believe that Eq. (5) is a good approximation for the cumulative distribution of cover time even for larger graphs, but since at the moment we do not have estimates for the accuracy of our approximation, we leave this as a subject of future research. A more detailed analysis of how the CDF of cover time depends on graph topology will be discussed in a forthcoming paper.




FIG. 8: Analytical, approximated and simulated (a) cumula- 
tive distribution function and (b) probability mass function 
of cover time for a random graph with 20 nodes 



FIG. 9: Exact vs approximated CDF for (a) a path graph 
with 50 nodes and (b) a path graph with 200 nodes 



V. CONCLUSIONS 

In this paper we have derived exact closed-form expressions for the PMF and CDF of three random walk parameters that play a pivotal role in the theory of random walks: hitting time, commute time, and cover time. We have also derived simpler closed formulas for the cumulative distribution function of cover time for complete, cycle, and path graphs. An approximation of the cumulative distribution function for cover time is proposed, and several numerical results for the CDF of cover time for different graphs are presented.



Appendix 

If $A$ is the union of the events $A_1, A_2, \ldots, A_n$ then, writing $p_i$ for the probability of $A_i$, $p_{ij}$ for the probability of $A_i \cap A_j$, $p_{ijk}$ for the probability of $A_i \cap A_j \cap A_k$, etc., the probability of $A$ is given by

$$P(A) = \sum_{i} p_i - \sum_{i<j} p_{ij} + \sum_{i<j<k} p_{ijk} - \cdots + (-1)^{n-1} p_{12\ldots n}.$$

The inclusion-exclusion principle tells us that if we know the $p_i, p_{ij}, p_{ijk}, \ldots$ then we can find $P(A)$. However, in practice we are unlikely to have full information on the $p_i, p_{ij}, p_{ijk}, \ldots$. Therefore, we are faced with the task of approximating $P(A)$ taking into account whatever partial information we are given. In certain cases where the events $A_i$ are in some sense close to being independent, there are a number of known results approximating $P(A)$. In this paper we use the following result [[18], equation (9)]:

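$$P(A) \approx 1 - \prod_{i=1}^{n} q_i \prod_{i=1}^{n} \prod_{j=i+1}^{n} q_{ij}, \qquad (7)$$

where

$$q_i = P(\bar{A}_i), \qquad (8)$$

$$q_{ij} = \frac{P(\bar{A}_i \cap \bar{A}_j)}{P(\bar{A}_i)\, P(\bar{A}_j)}. \qquad (9)$$

Let the event $B_i$ be defined as $B_i = \bar{A}_i$. Then $B = \bigcup_{i=1}^{n} B_i = \bigcup_{i=1}^{n} \bar{A}_i$ and the probability of this event is:

$$P(B) = P\Big(\bigcup_{i=1}^{n} \bar{A}_i\Big) = 1 - P\Big(\bigcap_{i=1}^{n} A_i\Big). \qquad (10)$$

The approximated form (7) for the event $B$ is:

$$P(B) \approx 1 - \prod_{i=1}^{n} P(\bar{B}_i) \prod_{i=1}^{n} \prod_{j=i+1}^{n} \frac{P(\bar{B}_i \cap \bar{B}_j)}{P(\bar{B}_i)\, P(\bar{B}_j)}. \qquad (11)$$

Replacing (10) and $\bar{B}_i = A_i$ into (11), we get:

$$P\Big(\bigcap_{i=1}^{n} A_i\Big) \approx \prod_{i=1}^{n} P(A_i) \prod_{i=1}^{n} \prod_{j=i+1}^{n} \frac{P(A_i \cap A_j)}{P(A_i)\, P(A_j)}, \qquad (12)$$

where $P(A_i \cap A_j)$ can be expressed as:

$$P(A_i \cap A_j) = P(A_i) + P(A_j) - P(A_i \cup A_j).$$

FIG. 10: Exact vs approximated CDF (a) and PMF (b) for complete graph with 50 nodes (horizontal axis: time step)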



When the events $A_i$ and $A_j$ are not close to being independent but, on the contrary, one of the events is a subset of the other, as is the case for the events in the cover time formula, the approximation formula (12) is not accurate.

The inaccuracy can be seen from the following example. Let the events $A_j$ for $j > i$ all be subsets of the event $A_i$. Then the probability of the event $A_i \cap A_j$ is

$$P(A_i \cap A_j) = P(A_j).$$

If we now substitute this expression in (12) we get

$$P\Big(\bigcap_{i=1}^{n} A_i\Big) \approx \prod_{i=1}^{n} P(A_i) \prod_{i=1}^{n} \prod_{j=i+1}^{n} \frac{1}{P(A_i)}.$$

Then, if $n$ is large and the probabilities $P(A_i)$ are very small numbers, this expression can be a number much bigger than one.

One way to solve the accuracy problem is not to take the second product over all node pairs, but just over $n - 1$ different neighboring pairs. We suggest the following approximation for the cumulative distribution of cover time:

$$P\Big(\bigcap_{i=1}^{n} A_i\Big) \approx \prod_{i=1}^{n} P(A_i) \prod_{i=1}^{n-1} \frac{P(A_i \cap A_{i+1})}{P(A_i)\, P(A_{i+1})}.$$

We note that this approximate probability expression reduces to the exact probability expression in two limiting cases: first, when all events are mutually independent, and second, when the events are nested, so that all of them contain one and the same event. The first claim can be proved just by noting that $P(A_i \cap A_{i+1}) = P(A_i)P(A_{i+1})$, and the second claim was proved previously, see equation (6), for the case in which the event $E_n$ is a subset of each of the nested events $E_i$, $i = 1, \ldots, n - 1$.
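The contrast between the all-pairs product (12) and the neighbouring-pairs product can be made concrete with a toy calculation. The Python sketch below uses hypothetical nested events with made-up probabilities $1/5, 1/10, 1/20, 1/40$ (not data from the paper) and hypothetical function names; `inter(i, j)` stands for $P(A_i \cap A_j)$.

```python
from fractions import Fraction

def all_pairs_approx(p, inter):
    """Eq. (12): product over all pairs i < j."""
    value = Fraction(1)
    for x in p:
        value *= x
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            value *= inter(i, j) / (p[i] * p[j])
    return value

def neighbour_pairs_approx(p, inter):
    """Proposed variant: product over the n - 1 neighbouring pairs (i, i + 1) only."""
    value = Fraction(1)
    for x in p:
        value *= x
    for i in range(len(p) - 1):
        value *= inter(i, i + 1) / (p[i] * p[i + 1])
    return value

# Hypothetical nested events: A_1 contains A_2 contains A_3 contains A_4.
p = [Fraction(1, 5), Fraction(1, 10), Fraction(1, 20), Fraction(1, 40)]

def nested(i, j):
    return p[max(i, j)]          # P(A_i and A_j) = P(A_j) for j > i

def independent(i, j):
    return p[i] * p[j]           # P(A_i and A_j) = P(A_i) P(A_j)

bad = all_pairs_approx(p, nested)              # exceeds 1, so it is not a probability
good = neighbour_pairs_approx(p, nested)       # equals P(A_4), the exact value
indep = neighbour_pairs_approx(p, independent) # equals the product of the P(A_i)
```

For nested events the exact probability of the intersection is the probability of the smallest event, which the neighbouring-pairs product recovers by telescoping, while the all-pairs product already exceeds one for these four events.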